[FLINK-38104][table] add table api support for model ml_predict #27108
base: master

Conversation
Force-pushed 074a935 to 982f128.
fsk119 left a comment:

Thanks for your contribution. I left some comments.
Force-pushed 824f5f1 to 208c58d.
fsk119 left a comment:

I left some comments.
```java
    throw new ValidationException("Anonymous models cannot be serialized.");
}

return "MODEL " + model.getIdentifier().asSerializableString();
```
It seems the name is lost after serialization. I am not sure whether we need to restore the object from the string.
The name is not used for TableReferenceExpression or literal argument names either; these are not serialized as named argument calls. I don't see this being restored from the serialized string. It looks like it is mainly used to convert the expression into a SQL query.
Do you think we need to add a case to verify:
- whether we can store a call with a model reference as a view via org.apache.flink.table.api.TableEnvironment#createView(java.lang.String, org.apache.flink.table.api.Table)
- whether we can restore the view and run tests?
I added an IT test for 1. I'm not sure what you mean by 2. Do you mean a Table API restore test? I don't see restore test support for the Table API though.
We should add a test to QueryOperationSqlSerializationTest.
```java
}
final RexTableArgCall tableArgCall =
        new RexTableArgCall(rowType, inputStack.size(), partitionKeys, new int[0]);
inputStack.add(relBuilder.build());
```
Can we maintain the stack and do the relBuilder.build() in the QueryOperationConverter?
It's hard to do since inputStack depends on this call. Maybe that's why TableReferenceExpression was handled in QueryOperationConverter in the first place.
Can we do it like this?

```java
Collections.reverse(resolvedArgs);
final List<RelNode> inputStack = new ArrayList<>();
final List<RexNode> rexNodeArgs =
        resolvedArgs.stream()
                .map(QueryOperationConverter.this::convertExprToRexNode)
                .peek(
                        expr -> {
                            if (expr instanceof RexTableArgCall) {
                                inputStack.add(relBuilder.build());
                            }
                        })
                .collect(Collectors.toList());
Collections.reverse(rexNodeArgs);
```
And in the ExpressionConverter:

```java
@Override
public RexNode visit(TableReferenceExpression tableRef) {
    final LogicalType tableArgType = tableRef.getOutputDataType().getLogicalType();
    final RelDataType rowType = typeFactory.buildRelNodeRowType((RowType) tableArgType);
    final int[] partitionKeys;
    if (tableRef.getQueryOperation() instanceof PartitionQueryOperation) {
        final PartitionQueryOperation partitionOperation =
                (PartitionQueryOperation) tableRef.getQueryOperation();
        partitionKeys = partitionOperation.getPartitionKeys();
    } else {
        partitionKeys = new int[0];
    }
    final RexTableArgCall tableArgCall =
            new RexTableArgCall(rowType, relBuilder.size() - 1, partitionKeys, new int[0]);
    // inputStack.add(relBuilder.build());
    return tableArgCall;
}
```
After the change, I find no test in ProcessTableFunctionSemanticTests fails. I think the logic for ptf(table t1, table t2, arg) is:

- When building the FunctionQueryOperation, the planner first pushes all operands used in the RelNode tree onto the stack. As a result, the stack head is t2, followed by t1.
- We then parse the operands in reverse order. For operand table t2, the current stack size is 2, so we can build a RexTableArgCall to refer to the stack head. After visiting, we pop the stack head.

WDYT?
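The reverse-then-pop ordering described above can be demonstrated with plain collections. This is a minimal sketch using strings in place of the Flink RexNode/RelNode types; the method and type names here are illustrative, not Flink APIs.

```java
import java.util.ArrayList;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collectors;

public class ReversePeekSketch {

    // Operands arrive in declaration order (t1, t2, arg); the builder stack
    // holds the matching inputs with t2 on top. Visiting in reverse order
    // lets each "table" operand pop the corresponding stack head.
    public static List<String> convert(
            List<String> operands, Deque<String> builderStack, List<String> poppedInputs) {
        List<String> reversed = new ArrayList<>(operands);
        Collections.reverse(reversed);
        List<String> converted =
                reversed.stream()
                        .map(op -> "rex(" + op + ")")
                        .peek(
                                rex -> {
                                    if (rex.startsWith("rex(table ")) {
                                        poppedInputs.add(builderStack.pop());
                                    }
                                })
                        .collect(Collectors.toList());
        Collections.reverse(converted); // restore declaration order
        return converted;
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("input(t1)"); // pushed first, ends up at the bottom
        stack.push("input(t2)"); // stack head
        List<String> popped = new ArrayList<>();
        List<String> result = convert(List.of("table t1", "table t2", "arg"), stack, popped);
        System.out.println(result); // [rex(table t1), rex(table t2), rex(arg)]
        System.out.println(popped); // [input(t2), input(t1)]
    }
}
```

The key point is that t2's input is consumed first (it is the stack head), yet the converted argument list comes out in the original declaration order after the second reverse.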
It looks like we still need to special-case RexTableArgCall here. How about we leave the original logic for TableReferenceExpression and just move ModelReferenceExpression to ExpressionConverter?
Okay. Can you open a ticket to track this problem?
fsk119 left a comment:

LGTM
twalthr left a comment:

Great work @lihaosky. I added a last set of comments to ensure the code stays consistent with other locations.
```java
 *
 * <p>The data type will be {@link DataTypes#DESCRIPTOR()}.
 */
public static ApiExpression descriptor(ColumnList columnList) {
```
This method is unnecessary. It is like a "descriptor in a descriptor". You can just pass ColumnList directly to any method that takes expressions.
```java
 * @param modelPath The path of the model in the catalog.
 * @return The {@link Model} object describing the model resource.
 */
Model fromModelPath(String modelPath);
```
nit: I'm still wondering whether we should call this fromModel. It would align better with fromCall.
```java
 * @param descriptor The {@link ModelDescriptor} describing the model resource.
 * @return The {@link Model} object representing the model resource.
 */
Model from(ModelDescriptor descriptor);
```
Call this fromModel to distinguish it from the heavily overloaded from() method. We should let from() be reserved for tables.
```java
// lit() is not serializable to sql.
if (options.isEmpty()) {
    return tableEnvironment.fromCall(
            BuiltInFunctionDefinitions.ML_PREDICT.getName(),
```
We don't need to call getName here. Ideally we can just use ApiExpressionUtils.unresolvedCall and operationTreeBuilder and pass the FunctionDefinition, i.e. go one level deeper. This avoids a catalog lookup for the function name; catalog lookups are expensive.
TableEnvironment.fromCall has additional checks like `tableReferenceChecker.check(arguments);`, which seem useful though.
```java
}

@Override
public org.apache.flink.table.api.Model fromModel(String modelPath) {
```
Resolve all imports in this class:

```diff
- public org.apache.flink.table.api.Model fromModel(String modelPath) {
+ public Model fromModel(String modelPath) {
```
```java
        flinkContext.getClassLoader(),
        contextResolvedModel.isTemporary());
final LogicalType modelOutputType =
        DataTypeUtils.fromResolvedSchemaPreservingTimeAttributes(
```
Time attributes are not a topic for models.
```java
public abstract class FunctionCallUtil {

    private static final String CONFIG_ERROR_MESSAGE =
            "Config parameter should be a MAP data type consisting String literals.";
```
Suggested change:

```diff
- "Config parameter should be a MAP data type consisting String literals.";
+ "Config parameter should be a MAP data type consisting of string literals.";
```
```java
.setupTableSink(SIMPLE_SINK)
.setupConfig(
        ExecutionConfigOptions.TABLE_EXEC_ASYNC_ML_PREDICT_OUTPUT_MODE,
        ExecutionConfigOptions.AsyncOutputMode.ALLOW_UNORDERED)
```
nit: simplify code

```diff
-         ExecutionConfigOptions.AsyncOutputMode.ALLOW_UNORDERED)
+         AsyncOutputMode.ALLOW_UNORDERED)
```
```java
}

@Test
public void testPredictTableApiWithView() {
```
Is this test still necessary after all semantic tests?
This addresses one of Shengkai's comments, to test createView in the Table API.
What is the purpose of the change

Add Table API support for Model and the ml_predict function, per FLIP-526: https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API

Brief change log
- Model interface and ModelImpl implementation for model and ml_predict
- fromModelPath and from to construct Model from TableEnvironment
- ModelReferenceExpression, handled in QueryOperationConverter
- ContextResolvedModel

Verifying this change
Unit and integration tests
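For context, the API surface described in the change log might be exercised roughly as follows. This is a Java-flavored pseudocode sketch assembled from names mentioned in this PR (fromModelPath, from(ModelDescriptor), fromCall with ML_PREDICT); the argument-wrapping helpers and exact signatures are assumptions, and FLIP-526 is the definitive reference.

```java
// Pseudocode sketch; names and signatures are illustrative, not verified.
TableEnvironment tEnv = TableEnvironment.create(settings);

// Construct a Model from a catalog path (or via from(ModelDescriptor)).
Model model = tEnv.fromModelPath("my_catalog.my_db.my_model");

// Call ML_PREDICT with the input table and the model reference.
Table result =
        tEnv.fromCall(
                "ML_PREDICT",
                tEnv.from("input_table").asArgument("INPUT"),
                model.asArgument("MODEL"),                     // hypothetical helper
                descriptor("feature_col").asArgument("ARGS")); // hypothetical usage
```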
Does this pull request potentially affect one of the following parts:
- @Public(Evolving): yes

Documentation